In this lab, we will first examine the various methods for constructing a spatial weight matrix, then go through the process of analyzing and modeling areal datasets. Before starting the lab, you will need to set up a new folder for your working directory. Go to your geog6000 folder now and create a new folder for today’s class called lab12. The following files will be used in this lab, all available on Canvas:
Download these files from Canvas, and move them from your Downloads folder to the datafiles folder that you made previously. Make sure to unzip the zip files so that R can access their content. Note that on Windows, you will need to right-click on each file and select ‘Extract files’.
Now start RStudio and change the working directory to lab12. As a reminder, you can do this by going to the [Session] menu in RStudio, then [Change working directory]. This will open a file browser that you can use to browse through your computer and find the folder.
You will also need several packages to carry out all the steps listed here: sf, spdep and spatialreg. Make sure these are installed (if you haven’t already) before starting. I’m going to use tmap to map out some of the results, but you are welcome to use ggplot2 or the basic plot() command.
pkgs <- c("RColorBrewer",
          "sf",
          "spdep",
          "spatialreg",
          "tmap")
install.packages(pkgs)
library(RColorBrewer)
library(sf)
library(spdep)
library(spatialreg)
library(tmap)
With all the examples given, it is important to not just type in the code, but to try changing the parameters and re-running the functions multiple times to get an idea of their effect. Help on the parameters can be obtained by typing help(functionname) or ?functionname.
We’ll start by using sf to read in and plot the geometries of the Boston housing dataset and the New York dataset.
boston <- st_read("../datafiles/boston.tr/boston.shp", quiet = TRUE)
plot(st_geometry(boston))
NY8 <- st_read("../datafiles/NY_data/NY8_utm18.shp", quiet = TRUE)
plot(st_geometry(NY8))
As a reminder, you can see the attribute table for the polygons by simply typing the name of the sf object, or by using the $ notation to extract individual variables:
head(NY8)
## Simple feature collection with 6 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 421423.4 ymin: 4661351 xmax: 427091.6 ymax: 4664822
## Projected CRS: WGS 84 / UTM zone 18N
## AREANAME AREAKEY X Y POP8 TRACTCAS PROPCAS
## 1 Binghamton city 36007000100 4.069397 -67.3533 3540 3.08 0.000870
## 2 Binghamton city 36007000200 4.639371 -66.8619 3560 4.08 0.001146
## 3 Binghamton city 36007000300 5.709063 -66.9775 3739 1.09 0.000292
## 4 Binghamton city 36007000400 7.613831 -65.9958 2784 1.07 0.000384
## 5 Binghamton city 36007000500 7.315968 -67.3183 2571 3.06 0.001190
## 6 Binghamton city 36007000600 8.558753 -66.9344 2729 1.06 0.000388
## PCTOWNHOME PCTAGE65P Z AVGIDIST PEXPOSURE Cases Xm Ym
## 1 0.3277311 0.1466102 0.14197 0.2373852 3.167099 3.08284 4069.397 -67353.3
## 2 0.4268293 0.2351124 0.35555 0.2087413 3.038511 4.08331 4639.371 -66861.9
## 3 0.3377396 0.1380048 -0.58165 0.1708548 2.838229 1.08750 5709.063 -66977.5
## 4 0.4616048 0.1188937 -0.29634 0.1406045 2.643366 1.06515 7613.831 -65995.8
## 5 0.1924370 0.1415791 0.45689 0.1577753 2.758587 3.06017 7315.968 -67318.3
## 6 0.3651786 0.1410773 -0.28123 0.1726033 2.848411 1.06386 8558.753 -66934.4
## Xshift Yshift geometry
## 1 423391.0 4661502 POLYGON ((421840.4 4662874,...
## 2 423961.0 4661993 POLYGON ((421826.4 4663108,...
## 3 425030.6 4661878 POLYGON ((423159 4664759, 4...
## 4 426935.4 4662859 POLYGON ((425261.8 4663493,...
## 5 426637.5 4661537 POLYGON ((425033.3 4662995,...
## 6 427880.3 4661921 POLYGON ((426178.8 4664216,...
hist(NY8$TRACTCAS,
xlab = "Number of cases/tract")
We can also make thematic maps by specifying the column we wish to plot:
my.pal <- brewer.pal(9, "YlOrRd")
plot(NY8["Cases"],
main = "New York Leukemia Cases",
col = my.pal)
In order to test the different methods shown in class, we will extract the polygons for the Syracuse area from the NY8 dataset. We then get their geometry using st_geometry(), and their centroids with the st_centroid() function. We’ll use both of these to visualize the neighborhood structure.
Syracuse <- NY8[NY8$AREANAME == "Syracuse city", ]
Syracuse.geom <- st_geometry(Syracuse)
Syracuse.coords <- st_centroid(Syracuse.geom)
plot(Syracuse.geom, reset = FALSE)
plot(Syracuse.coords, pch = 16, col = 2, add = TRUE)
The following section lists various methods for producing neighborhood structures in R. Functions to plot these are only shown for the first - you are encouraged to try plotting several of these to understand which areas are considered as neighbors.
Here two regions are considered neighbors if their boundaries or corners touch (the Queen’s case). This structure can be built using the poly2nb() function:
Sy1_nb <- poly2nb(Syracuse)
To visualize the resulting structure, plot the geometry of the original polygons, then add the structure - this requires the neighborhood, the centroids, and various color and line width options:
plot(Syracuse.geom, reset = FALSE)
plot(Sy1_nb, Syracuse.coords, add = TRUE, col = 2, lwd = 1.5)
Use this code to visualize the following structures as well.
Here two regions are considered neighbors only if a contiguous part of their boundaries touches (the Rook’s case):
Sy2_nb <- poly2nb(Syracuse, queen = FALSE)
Here a triangulation is built between sets of three points. A set of three points is joined as long as no other points are found in a circle fit to the three points:
Sy3_nb <- tri2nb(Syracuse.coords)
The sphere-of-influence method restricts the links formed by Delaunay triangulation to a certain length:
Sy4_nb <- graph2nb(soi.graph(Sy3_nb, Syracuse.coords))
Here each region is linked to its \(k\) nearest neighbors, irrespective of the distance between them:
Sy5_nb <- knn2nb(knearneigh(Syracuse.coords, k = 1))
Sy6_nb <- knn2nb(knearneigh(Syracuse.coords, k = 2))
Distance functions link together two regions if their centroids are within a certain distance of each other. This requires two parameters: d1, the minimum distance, and d2, the maximum distance. Here we will set the maximum to 75% of the largest distance obtained when using 2 nearest neighbors. We obtain all the distances between nearest neighbor pairs in the first line of code (this extracts distances as a list with nbdists() and converts them into a vector with unlist()). Then we find the maximum of these distances. Finally we set d2 to this value \(*0.75\).
dists <- nbdists(Sy6_nb, Syracuse.coords)
dists <- unlist(dists) #list to vector
max_1nn <- max(dists)
Sy7_nb <- dnearneigh(Syracuse.coords, d1 = 0, d2 = 0.75 * max_1nn)
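Whichever method you use, it is worth summarizing the resulting structure before going further. As a quick sketch (assuming the Sy7_nb object created above is in your session), spdep's card() function counts the neighbors of each polygon, and summary() reports the distribution of links:

```r
# Number of neighbors per polygon, tabulated across all polygons
table(card(Sy7_nb))

# Overall summary: total links, average links, least/most connected regions
summary(Sy7_nb)
```

A polygon with zero neighbors (an "island") will cause problems in later functions, so this is a useful check after any of the methods above.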
While these functions provide a quick way to establish a neighborhood structure, they will likely include some connections that are unrealistic or exclude some real ones. It is possible to edit the neighborhood structure directly - this is simply a list of \(n\) vectors, where each vector contains the neighbors linked to a single polygon. To see the contents, simply type:
str(Sy1_nb)
And to access the first polygon’s neighbors:
Sy1_nb[[1]]
## [1] 2 5 11 21 22
While editing by hand is possible, it is not easy. A simpler way is to use the edit.nb() function that allows interactive editing of a neighborhood structure. Instructions for using this are given in the appendix below.
The calculation of spatial weights from the neighborhood function can be done in several ways. In the following code, the first method does no standardization (binary weights) and the second does row standardization, so that for any given area, each neighbor is downweighted by the total number of neighbors. The function we use is nb2listw() which returns the sparse matrix giving the links between neighbors and the weight of this link. Non-neighbors get a weight of zero.
Sy1_lw_B <- nb2listw(Sy1_nb, style = 'B')
Sy1_lw_W <- nb2listw(Sy1_nb, style = 'W')
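To see the effect of the two coding schemes, compare the weights stored for the first polygon (a quick sketch, assuming the objects created above are in your session; the listw object stores the weights as a list, one vector per polygon):

```r
# Binary coding: every neighbor of polygon 1 gets a weight of 1
Sy1_lw_B$weights[[1]]

# Row standardization: each weight is 1 / (number of neighbors),
# so the weights for any given polygon sum to 1
Sy1_lw_W$weights[[1]]
```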
A more sophisticated method is to use some knowledge of the dataset. Weights could be assigned by an inverse-distance method, where closer neighbors have higher weights. To do this, we need the list of distances along each join in the neighborhood structure; here we use the first one generated, the Queen’s case. First, we extract the distances from the neighborhood list with nbdists(), which returns a list of distances. We then create a new function, inverse_distance(), which divides each distance by 1000 (the coordinates are in meters, so this converts to kilometers and makes the values a little more manageable) and takes the reciprocal. Because nbdists() returns a list, we apply this function to each item using lapply(), the list version of the apply() function we have used earlier.
dists <- nbdists(Sy1_nb, Syracuse.coords)
inverse_distance <- function(x) {1/(x/1000)}
idw <- lapply(dists, inverse_distance)
Sy1_lw_idwB <- nb2listw(Sy1_nb, glist = idw, style = "B")
From here on, we will concentrate on analyzing the Boston housing price data set. The median house price values that we wish to model have a skewed distribution (look at a histogram to see this), so we will create a new variable, containing log house prices:
boston$logCMEDV <- log(boston$CMEDV)
hist(boston$logCMEDV)
plot(boston["logCMEDV"])
We’ll also plot the map of this variable using tmap:
tm_shape(boston) +
tm_fill("logCMEDV") +
tm_borders()
We also build a neighborhood structure, using the Queen’s case contiguity method:
boston.nb <- poly2nb(boston, queen = TRUE)
coords <- st_centroid(st_geometry(boston))
plot(st_geometry(boston), reset = FALSE)
plot(boston.nb, coords, add = TRUE, col = "gray")
And finally convert this to a spatial weight matrix:
boston.listw <- nb2listw(boston.nb)
We will start by looking for global autocorrelation in the housing price data. Use the moran.test() function for this. Note we set the randomization assumption to be true as we know nothing about the larger spatial trends in this dataset. The function requires the variable and a spatial weight matrix. Look for the value of Moran’s \(I\), and the \(z\)-score (standard deviate) and associated \(p\)-value:
moran.test(boston$logCMEDV,
listw = boston.listw,
alternative = "two.sided",
randomisation = TRUE)
##
## Moran I test under randomisation
##
## data: boston$logCMEDV
## weights: boston.listw
##
## Moran I statistic standard deviate = 27.06, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.7270399706 -0.0019801980 0.0007258269
Purely for the sake of comparison, we’ll also calculate Moran’s \(I\) under the normality assumption, which assumes that the pattern of house prices in Boston is a subset of a larger spatial trend. You should see a slight change in the calculated variance, but not enough to affect our conclusion that the data are strongly autocorrelated.
moran.test(boston$logCMEDV,
listw = boston.listw,
alternative = "two.sided",
randomisation = FALSE)
##
## Moran I test under normality
##
## data: boston$logCMEDV
## weights: boston.listw
##
## Moran I statistic standard deviate = 27.038, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.7270399706 -0.0019801980 0.0007270156
We can also calculate the significance of the observed autocorrelation using a Monte Carlo method. Here, we randomly redistribute the values across the locations several hundred times, recalculating Moran’s \(I\) each time. Once done, we look at the rank of the observed version of Moran’s \(I\) against those obtained from random resampling. If we are either at the high or low end of all these random realizations, it is highly likely that the observed distribution is significantly autocorrelated. As we are using a rank, we cannot use a two-sided test, but must specify if we believe the autocorrelation to be positive (‘greater’) or negative (‘less’). The number of simulations to run is given by the parameter nsim. Increasing this will increase the precision of the \(p\)-value obtained (but take longer to run):
moran.mc(boston$logCMEDV,
listw = boston.listw,
nsim = 999,
alternative = 'greater')
##
## Monte-Carlo simulation of Moran I
##
## data: boston$logCMEDV
## weights: boston.listw
## number of simulations + 1: 1000
##
## statistic = 0.72704, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
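The result of moran.mc() can also be plotted as a visual check; spdep provides a plot method for the returned object that draws the density of the simulated values of Moran's \(I\), with the observed statistic marked. A sketch, storing the result first:

```r
# Re-run the permutation test, keep the result, and plot the simulated
# distribution; the observed value (0.727) lies far beyond the range of
# the permuted values, consistent with the p-value of 0.001
boston.mc <- moran.mc(boston$logCMEDV,
                      listw = boston.listw,
                      nsim = 999,
                      alternative = 'greater')
plot(boston.mc)
```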
Now make the Moran scatterplot. This shows the relationship between the value of any given area and its neighbors. The slope of the fitted line is the value of Moran’s \(I\):
moran.plot(boston$logCMEDV,
boston.listw,
labels = as.character(boston$ID),
xlab = "Log Median Value",
ylab = "Lagged Log Median Value")
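The claim about the slope can be verified directly. With row-standardized weights, regressing the spatially lagged variable on the original reproduces Moran's \(I\); lag.listw() computes the lagged values. A quick sketch:

```r
# Compute the spatial lag of log house prices, then regress it on the
# original variable; the slope should match the Moran's I reported by
# moran.test() (about 0.727)
lagged <- lag.listw(boston.listw, boston$logCMEDV)
coef(lm(lagged ~ boston$logCMEDV))[2]
```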
We will now calculate the local Moran’s \(I\) to examine the spatial pattern of autocorrelation. This returns a statistic (and \(z\)-score) for each area considered.
lm1 = localmoran(boston$logCMEDV,
listw = boston.listw,
alternative = "two.sided")
head(lm1)
## Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
## 1 -0.224510839 -2.873020e-04 0.017914316 -1.67525561 0.093884090
## 2 -0.001798936 -2.175813e-05 0.002735961 -0.03397629 0.972896060
## 3 0.137484323 -9.177414e-05 0.005723568 1.81848429 0.068990146
## 4 0.033966647 -8.277718e-05 0.010408122 0.33375177 0.738566878
## 5 0.491120158 -4.043873e-04 0.022365621 3.28665963 0.001013833
## 6 0.002533256 -2.287544e-05 0.001909980 0.05848826 0.953359715
The results may be extracted and plotted. The \(z\)-scores are in the fourth column of the output, and the \(p\)-values in the fifth column:
boston$zscore <- lm1[,4]
boston$pval <- lm1[,5]
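Note that the column positions in the localmoran() output have changed between spdep versions, so extracting by name (using the names shown in the printed output above) is a safer alternative to positional indexing:

```r
# Same extraction as above, but by column name rather than position
boston$zscore <- lm1[, "Z.Ii"]
boston$pval   <- lm1[, "Pr(z != E(Ii))"]
```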
Now let’s plot these results. First the \(z\)-scores:
tm_shape(boston) +
tm_fill("zscore", palette = "Reds", style = "jenks", n = 6) +
tm_borders() +
tm_layout(main.title = "Local Moran's I (z-scores)",
legend.position = c("left", "bottom"))
For the \(p\)-values, we’ll convert these into a binary vector, with ones if the \(p\)-value is below 0.05 and zeros otherwise. Plotting these clearly shows the areas with high autocorrelation, by the harbor and to the west of the city.
boston$pval.bin <- as.factor(ifelse(boston$pval < 0.05, "Significant", "Not-significant"))
tm_shape(boston) +
tm_fill("pval.bin") +
tm_borders() +
tm_layout(main.title = "Local Moran's I (significant areas)",
main.title.size = 1,
legend.position = c("left", "bottom"))
Finally, we will use the Getis-Ord \(G^*\) statistic to look at local variation in values, by identifying regions with clusters of high or low values. The \(G^*\) statistic includes the value at the location of interest as well as its neighbors. To account for this, we first need to make a new spatial weight matrix that includes a weight for the link of a location to itself (with the include.self() function). The statistic is calculated using the localG() function, which takes as input the variable of interest and the spatial weight matrix.
boston.listwGs <- nb2listw(include.self(boston.nb), style = "B")
boston$lG <- as.numeric(localG(boston$logCMEDV, boston.listwGs))
The output is the \(G^*\) statistic, converted to a \(z\)-score. Now we plot the results.
tm_shape(boston) +
tm_fill("lG", palette = "RdYlBu", style = "jenks", n = 6) +
tm_borders() +
tm_layout(main.title = "Local Getis-Ord G(*) hot/cold spots",
main.title.size = 1,
legend.position = c("left", "bottom"))
The results show a very clear structure, with clusters of low prices in the center of Boston, interrupted by a cluster of higher prices running along the river to the harbor. Other clusters, of high values, are found in the suburbs to the west.
boston$lG.sig <- ifelse(boston$lG < -1.96,
-1,
ifelse(boston$lG > 1.96, 1, 0))
boston$lG.sig <- factor(boston$lG.sig, labels=c("Cold", "Not-significant", "Hot"))
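The nested ifelse() calls above can also be replaced with a single call to cut(), which assigns the breaks and labels in one step and returns a factor directly (an equivalent sketch):

```r
# Classify the G* z-scores into cold spots, non-significant areas and
# hot spots, using +/-1.96 as the approximate 5% critical values
boston$lG.sig <- cut(boston$lG,
                     breaks = c(-Inf, -1.96, 1.96, Inf),
                     labels = c("Cold", "Not-significant", "Hot"))
```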
tm_shape(boston) +
tm_fill("lG.sig", palette = "-RdYlBu") +
tm_borders() +
tm_layout(main.title = "Local Getis-Ord G(*) hot/cold spots",
main.title.size = 1,
legend.position = c("left", "bottom"))
The spatialreg package has functions for building spatial regression models. We will build several spatial regression models using the crime dataset from Columbus. The dataset is available as a shapefile in columbus.zip - download and extract this before starting.
col <- st_read("../datafiles/columbus/columbus.shp", quiet = TRUE)
Our goal here is to model the crime rate using information about household income (INC) and price (HOVAL). As both of these are right skewed, we’ll log-transform them for the analysis:
col$lINC <- log(col$INC)
col$lHOVAL <- log(col$HOVAL)
Start by making a map of the crime rate (CRIME) by adapting the code given above to plot the NY and Boston datasets. Now build the neighborhood structure and the associated spatial weight matrix. The example below uses the Queen’s case definition of neighbors with the default row-standardized weights, but these can be replaced by other options fairly easily.
col.geom <- st_geometry(col)
col.coords <- st_centroid(col.geom)
col.nbq <- poly2nb(col)
plot(col.geom, reset = FALSE)
plot(col.nbq, col.coords, add = TRUE)
Now convert this to a spatial weight matrix:
col.listw <- nb2listw(col.nbq)
Use the code given above to look for spatial autocorrelation in the CRIME variable, using both Global and Local Moran’s \(I\).
Next, we make an OLS regression, excluding all spatial information:
col.fit1 <- lm(CRIME ~ lINC + lHOVAL, data = col)
summary(col.fit1)
##
## Call:
## lm(formula = CRIME ~ lINC + lHOVAL, data = col)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.361 -5.997 -1.521 7.432 28.013
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 138.074 14.542 9.495 2.06e-12 ***
## lINC -19.272 5.024 -3.836 0.000379 ***
## lHOVAL -14.924 4.568 -3.267 0.002059 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.6 on 46 degrees of freedom
## Multiple R-squared: 0.5392, Adjusted R-squared: 0.5191
## F-statistic: 26.91 on 2 and 46 DF, p-value: 1.827e-08
The model appears to be a fairly good one, with an \(F\)-statistic that is significant. However, given the pattern in the crime data seen on the map above, it is likely that the model may be impacted by autocorrelation, so we’ll now test for autocorrelation in the residuals, using Moran’s \(I\):
moran.mc(residuals(col.fit1),
listw = col.listw,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(col.fit1)
## weights: col.listw
## number of simulations + 1: 1000
##
## statistic = 0.21308, observed rank = 998, p-value = 0.002
## alternative hypothesis: greater
I’ve used the Monte Carlo version of Moran’s \(I\), but this could be replaced by the other approaches, depending on your assumptions about the spatial pattern.
The Lagrange multiplier test is used to assess whether the autocorrelation is in the values of the dependent variable or in its errors, and helps in the choice of which spatial regression model to use. We first run this test, using the non-robust version. The tests are given by the parameter test - the full range can be found in the help page for the function.
lmt <- lm.LMtests(col.fit1, col.listw, test = c("LMerr","LMlag"))
summary(lmt)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = CRIME ~ lINC + lHOVAL, data = col)
## weights: col.listw
##
## statistic parameter p.value
## LMerr 4.7913 1 0.028603 *
## LMlag 9.1694 1 0.002461 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Higher values of the statistic indicate a more likely source of correlation. Where both are significant, the robust test (RLMerr and RLMlag) should be used to decide. These robust tests account for autocorrelation in one term, then test for remaining autocorrelation in the other term.
lmt_robust <- lm.LMtests(col.fit1, col.listw, test = c("RLMerr","RLMlag"))
summary(lmt_robust)
## Lagrange multiplier diagnostics for spatial dependence
## data:
## model: lm(formula = CRIME ~ lINC + lHOVAL, data = col)
## weights: col.listw
##
## statistic parameter p.value
## RLMerr 0.024805 1 0.87485
## RLMlag 4.402809 1 0.03588 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Evidence for significant autocorrelation is shown in the RLMlag test, suggesting a spatial lag model.
A spatial lag model can be fit to the data using the lagsarlm() function. The syntax follows that of most modeling functions in R, except that we need to give it a spatial weight matrix:
col.fit2 <- lagsarlm(CRIME ~ lINC + lHOVAL,
data = col,
col.listw)
summary(col.fit2)
##
## Call:lagsarlm(formula = CRIME ~ lINC + lHOVAL, data = col, listw = col.listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.1856 -5.4709 -0.5156 6.4413 22.5983
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 99.5700 16.1538 6.1639 7.099e-10
## lINC -11.9221 4.5544 -2.6177 0.0088524
## lHOVAL -13.5984 3.9965 -3.4026 0.0006676
##
## Rho: 0.42092, LR test value: 8.9447, p-value: 0.0027828
## Asymptotic standard error: 0.12671
## z-value: 3.3219, p-value: 0.00089398
## Wald statistic: 11.035, p-value: 0.00089398
##
## Log likelihood: -183.6196 for lag model
## ML residual variance (sigma squared): 100.73, (sigma: 10.036)
## Number of observations: 49
## Number of parameters estimated: 5
## AIC: 377.24, (AIC for lm: 384.18)
## LM test for residual autocorrelation
## test value: 0.028921, p-value: 0.86496
There are several things to note in the output:

- the model coefficients, which are reported with asymptotic standard errors and \(z\)-values rather than \(t\)-values
- the rho value, showing the strength and significance of the autoregressive spatial component
- the AIC of the spatial model, which can be compared directly to that of the OLS model (‘AIC for lm’)

As with the previous model, we can test the residuals for any remaining autocorrelation:
moran.mc(residuals(col.fit2),
listw = col.listw,
nsim = 999)
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(col.fit2)
## weights: col.listw
## number of simulations + 1: 1000
##
## statistic = 0.010517, observed rank = 641, p-value = 0.359
## alternative hypothesis: greater
While the results of the Lagrange multiplier test indicated a spatial lag model as the most suitable method, we can also fit a spatial error model, using the errorsarlm() function:
col.fit3 <- errorsarlm(CRIME ~ lINC + lHOVAL,
data = col,
col.listw)
summary(col.fit3)
##
## Call:errorsarlm(formula = CRIME ~ lINC + lHOVAL, data = col, listw = col.listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.3722 -6.3766 -0.1172 6.9844 21.9846
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 117.7761 14.7937 7.9612 1.776e-15
## lINC -9.9844 4.9172 -2.0305 0.042305
## lHOVAL -16.0041 4.1761 -3.8323 0.000127
##
## Lambda: 0.53456, LR test value: 6.9048, p-value: 0.0085966
## Asymptotic standard error: 0.14043
## z-value: 3.8066, p-value: 0.00014091
## Wald statistic: 14.49, p-value: 0.00014091
##
## Log likelihood: -184.6396 for error model
## ML residual variance (sigma squared): 101.71, (sigma: 10.085)
## Number of observations: 49
## Number of parameters estimated: 5
## AIC: 379.28, (AIC for lm: 384.18)
In the output of the function, note the value of lambda, the autoregressive coefficient representing the strength of autocorrelation in the residuals of the linear model. Note also the AIC, and compare it to the previous model.
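Since spatialreg provides log-likelihood methods for these model objects, the generic AIC() function gives a quick side-by-side comparison of the three fits made so far (a sketch, assuming all three model objects are in your session):

```r
# Compare the OLS, spatial lag and spatial error models; lower AIC
# indicates a better trade-off of fit against model complexity
AIC(col.fit1, col.fit2, col.fit3)
```

From the summaries above, the lag model has the lowest AIC (377.24), followed by the error model (379.28) and the OLS fit (384.18), consistent with the Lagrange multiplier results.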
So far, we have only considered correlation between the values of the dependent variable in any zone and its neighbors. An alternative source of spatial dependency might arise between the values of the dependent variable and neighboring values of an independent variable. To model this, we can use a spatial Durbin model, which includes lagged independent variables from the neighboring zones. This also uses the lagsarlm() function, but with the parameter type set to ‘mixed’ to specify a spatial Durbin lag model:
col.fit4 <- lagsarlm(CRIME ~ lINC + lHOVAL,
data = col,
col.listw,
type = 'mixed')
summary(col.fit4)
##
## Call:lagsarlm(formula = CRIME ~ lINC + lHOVAL, data = col, listw = col.listw,
## type = "mixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.42025 -5.66457 0.10208 5.61973 21.82351
##
## Type: mixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 95.4310 31.8319 2.9980 0.002718
## lINC -9.6490 4.9167 -1.9625 0.049702
## lHOVAL -15.2748 4.1030 -3.7229 0.000197
## lag.lINC -13.6313 8.4749 -1.6084 0.107743
## lag.lHOVAL 11.8547 8.2710 1.4333 0.151776
##
## Rho: 0.35363, LR test value: 3.4569, p-value: 0.062987
## Asymptotic standard error: 0.16701
## z-value: 2.1174, p-value: 0.034225
## Wald statistic: 4.4834, p-value: 0.034225
##
## Log likelihood: -182.0123 for mixed model
## ML residual variance (sigma squared): 95.663, (sigma: 9.7807)
## Number of observations: 49
## Number of parameters estimated: 7
## AIC: 378.02, (AIC for lm: 379.48)
## LM test for residual autocorrelation
## test value: 0.14333, p-value: 0.705
The output gives the coefficients for all variables, including the lagged versions. Are any of these significant?
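One caveat when interpreting lag and Durbin models: because of the spatial feedback through rho, the coefficients are not simple marginal effects. The impacts() function in spatialreg decomposes them into direct effects (within a zone) and indirect, or spillover, effects (through neighbors). A sketch of its basic use; see the impacts() help page for simulation-based inference:

```r
# Direct, indirect and total impacts for the spatial Durbin model
impacts(col.fit4, listw = col.listw)
```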
tax_charge) in the 48 conterminous U.S. states for the period 1955-1959, as well as the average used car price for 1960 (price_1960). Use this dataset to build a spatial model linking used car prices to the tax and delivery costs. You will need the spdep library.
- Use the summary() function to get information about the distribution of links per state, and report this
- Use the moran.test() or moran.mc() function to test for spatial autocorrelation in the prices of used cars. Report the Moran’s \(I\) statistic, the \(z\)-score and whether or not you can reject the null hypothesis that there is no spatial autocorrelation
- Build a linear model with the lm() function between the used car prices (dependent or Y variable) and the tax and delivery cost (independent or X variable). Report the coefficients of the model and the \(R^2\). Check for autocorrelation in the residuals of the model using the moran.mc() function, and state whether or not this is present
- Fit a spatial regression model (the lagsarlm() or errorsarlm() function), and report the results
- Test the residuals of this spatial model with the moran.test() or moran.mc() function. Given the value you obtain, state whether you think that the model adequately accounts for autocorrelation

The spdep package includes a function for interactive editing of neighborhood structures. However, this does not currently work in RStudio due to issues with resizing the plotting window. Instead, you will need to run it from the base R application. As an example, open R (not RStudio!) from your application or start menu. Once open, use the ‘Misc’ menu and choose ‘Change working directory’ to browse to the lab files. Once there, load the NY data set, create the Syracuse subset and make a neighborhood structure:
library(sf)
library(spdep)
library(sp)
NY8 <- st_read("../datafiles/NY_data/NY8_utm18.shp")
Syracuse <- NY8[NY8$AREANAME == "Syracuse city", ]
Sy1_nb <- poly2nb(Syracuse)
Note that when you make a plot in base R, it opens in a separate window. Now, let’s edit this structure. This function does not yet work with sf objects, so we’ll first need to convert the Syracuse object using the sp library. Run the following code:
Syr2 <- as_Spatial(Syracuse)
coords <- coordinates(Syr2)
Sy2_nb <- edit.nb(Sy1_nb, coords, polys = Syr2)
This will display the polygons, centroids and neighborhood structure, and the console will display “Identifying contiguity for deletion”. You can now do two things:

- Select the centroids of two linked polygons to remove the link between them. The console will ask “Delete this line (y/n)”. Enter y to do so, or n to ignore this and move on
- Select the centroids of two unlinked polygons to create a new link. The console will ask “Add contiguity (y/n)”. Enter y to do so, or n to ignore this and move on

Each time you add or delete a link, the function will ask if you want to refresh, continue or quit. Continuing simply waits for you to select two more centroids. If you refresh, it will redraw the neighborhood structure, highlighting which links have been deleted (dashed line) or added (yellow line). If you quit, the interactive session will terminate and the new neighborhood structure will be written out. In the code above, we assign the output of the function to Sy2_nb. If you now plot this and the previous one, you should see the edits you have made. This new neighborhood structure can then be used in all the subsequent functions (Moran’s \(I\), spatial regression, etc.). To keep and re-use this structure in RStudio, make sure to save it. As the neighborhood object is somewhat complicated, the easiest way to do this is to write it to an R binary file (this saves both neighborhood structures to a single file):
save(Sy1_nb, Sy2_nb, file="Sy_nb.RData")
You can then load this in a new R or RStudio session to recreate these objects:
load(file="Sy_nb.RData")